Improved location features for meeting speaker diarization
Abstract
This paper proposes several improvements to the correlation-based location features recently used in meeting speaker diarization. A speech-specific alternative to the generalized cross-correlation phase transform (GCC-PHAT) algorithm is tested and shown to provide equal or better results without noise reduction or continuity-enforcing smoothing. The limitations of a single correlation reference waveform are discussed, and it is shown how a multi-band energy ratio feature can help overcome them, yielding significantly improved performance. An all-pairs correlation is also proposed; when combined with energy ratios, it likewise improves upon the baseline system. However, the best combination is the baseline correlation features with energy ratios.
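For orientation, the standard GCC-PHAT delay estimate that the abstract takes as its starting point can be sketched as follows. This is a minimal illustration of textbook GCC-PHAT, not the speech-specific alternative the paper proposes; the function name `gcc_phat_delay`, its parameters, and the small regularization constant are assumptions made for this example.

```python
import numpy as np

def gcc_phat_delay(sig, ref, fs, max_delay=None):
    """Estimate the delay (in seconds) of `sig` relative to `ref` using GCC-PHAT.

    Hypothetical helper for illustration; not taken from the paper.
    """
    n = len(sig) + len(ref)              # zero-pad to avoid circular wrap-around
    sig_f = np.fft.rfft(sig, n=n)
    ref_f = np.fft.rfft(ref, n=n)
    cross = sig_f * np.conj(ref_f)       # cross-power spectrum
    cross /= np.abs(cross) + 1e-12       # PHAT weighting: discard magnitude, keep phase
    cc = np.fft.irfft(cross, n=n)        # generalized cross-correlation

    max_shift = n // 2
    if max_delay is not None:
        max_shift = max(1, min(int(fs * max_delay), max_shift))

    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(cc)) - max_shift
    return lag / fs

# Usage: a copy of a noise signal delayed by 5 samples at 16 kHz.
rng = np.random.default_rng(0)
ref = rng.standard_normal(4096)
sig = np.concatenate((np.zeros(5), ref[:-5]))
delay = gcc_phat_delay(sig, ref, fs=16000)
```

In a multi-microphone meeting setup, one such delay per channel pair (or per channel against a single reference channel) would form the location feature vector for each frame.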
Similar resources
Speaker diarization of spontaneous meeting room conversations
Speaker diarization is the task of identifying “who spoke when” in an audio stream containing multiple speakers. This is an unsupervised task, as there is no a priori information about the speakers. Diagnostic studies on state-of-the-art diarization systems have isolated three main issues with the systems: overlapping speech, effects of background noise and speech/nonspeech detection errors on...
Two's a crowd: improving speaker diarization by automatically identifying and excluding overlapped speech
We present an update to our initial work [1] on overlapped speech detection for improving speaker diarization. Specifically, we describe the addition of new features and feature warping techniques that improve segmenter and, consequently, diarization performance. We also demonstrate improved diarization performance by additionally using overlap segment information in a new diarization pre-proce...
Integration of TDOA features in information bottleneck framework for fast speaker diarization
In this paper we address the combination of multiple feature streams in a fast speaker diarization system for meeting recordings. Whenever Multiple Distant Microphones (MDM) are used, it is possible to estimate the Time Delay of Arrival (TDOA) for different channels. In [9], it is shown that TDOA can be used as additional features together with conventional spectral features for improving speak...
Information Bottleneck Features for HMM/GMM Speaker Diarization of Meetings Recordings
Improved diarization results can be obtained through combination of multiple systems. Several combination techniques have been proposed based on output voting, initialization and also integrated approaches. This paper proposes and investigates a novel approach to combine diarization systems through the use of features. A first diarization system, based on the Information Bottleneck, is used to ...
Improved Overlapped Speech Handling for Speaker Diarization
We present our ongoing work in addressing the issue of overlapped speech in speaker diarization through the use of overlap segmentation, overlapped speech exclusion, and overlap segment labeling. Using feature analysis, we identify the most salient features from a candidate list including those from our previous system and a set of newly proposed features. In addition, through independent optim...
Publication date: 2007